Abstract

We propose a set of statistics S_q for detecting non-gaussianity in CMBR anisotropy
data sets. These statistics are both simple and, according to calculations over a space
of linear combinations of three-point functions, nearly optimal at detecting certain types
of non-gaussian features. We apply S_3 to the UCSB SP91 experiment and find that the
mean of the four frequency channels is by this criterion strongly non-gaussian. Such an
observation would be highly unlikely in a gaussian theory with a small coherence angle,
such as standard (n = 1, Ω_b = 0.05, h = 0.5, Λ = 0) inflation. We cannot conclude
that the non-gaussianity is cosmological in origin, but if we assume it is due instead to
foreground contamination or instrumental effects, and remove the points which are clearly
responsible for the non-gaussian behavior, the rms of the remaining fluctuations is too
small for consistency with standard inflation at high confidence. Further data are clearly
needed, however, before definitive conclusions may be drawn. We also generalize the ideas
behind this statistic to non-gaussian features that might be detected in other experimental
schemes.

1. Introduction



Many experiments, current and proposed, are dedicated to measuring fluctuations in 
the Cosmic Microwave Background Radiation (CMBR). These measurements promise a 
strong experimental test of theories of structure formation in the early universe, as each 
theory predicts a distinct magnitude and form for CMBR fluctuations. In inflationary 
models, the structure generation mechanism is linear, resulting in a gaussian pattern of 
fluctuations, completely characterised by its power spectrum (see e.g. Efstathiou 1990). 
By contrast, in theories based on symmetry breaking and field ordering (e.g. cosmic strings
and textures) nonlinear dynamics lead to a non-gaussian anisotropy pattern, due in part to 
horizon-sized topological defects at the epoch of last scattering (Kaiser & Stebbins 1984; 
Turok & Spergel 1990; Bennett & Rhie 1992; Coulson, Pen, & Turok 1993; Pen, Spergel, 
& Turok 1993). 

CMBR measurements have not yet discriminated among different structure formation
theories to the extent that one might have hoped. This is in part because the measurements
are still far from perfect. COBE has a low signal-to-noise ratio and a large angular smoothing
scale (Smoot et al. 1992; Ganga et al. 1993), while other experiments are more accurate
but cover only a small region of the sky (e.g. Gaier et al. 1992; Schuster et al. 1993;
Meinhold et al. 1993; Gundersen et al. 1993; Cheng et al. 1993). But it is also
because most theories include parameters (n, h, Ω, Ω_B, Λ, (Tensor/Scalar)...) which can
be adjusted to modify the power spectrum. Such adjustments do not, however, alter the
more fundamental gaussian or non-gaussian character of the theories, which may be a more
powerful discriminator. This paper is aimed at finding statistics which focus on this basic
question. Other recent papers which propose statistical tests for non-gaussianity are Luo
& Schramm (1993) and Moessner, Perivaropoulos & Brandenberger (1993).

We are attempting to extract information from very small data sets, obtained in very 
difficult experiments. This is of course quite hazardous: it is unlikely that the idealized 
assumptions we shall make about the experimental errors are correct. Any effect we see 
may well be due to foreground sources or systematic instrumental effects, rather than non- 
gaussian cosmology. Nevertheless it is an interesting exercise to see how much may be 
learned, in principle, from experiments of the type currently being undertaken. And this 
may also serve as a guide to what kind of experimental effort would be most informative 
in the future. At the very least, we can make rigorous a process which is often performed
by eye: the identification of data points which are inconsistent with gaussian theories and
must be thrown out as contaminated if these theories are to be believed.

The various non-gaussian field ordering theories predict different characteristic forms
on the microwave sky: for example, linelike discontinuities for strings, hot and cold spots
for textures. However, they all predict regions of sharp gradient, separated by a
characteristic scale of order the horizon at last scattering, from one to a few degrees depending on
the reionisation history of the universe (Pen et al. 1993; Coulson et al. 1993). In this
paper we shall discuss statistics which are sensitive to degree scale large-gradient regions, 
and use them to discriminate between gaussian and non-gaussian theories. 

We shall follow tradition and use as our canonical 'straw man' gaussian theory the
'standard' inflationary model, with parameters n = 1, h = 0.5, Ω = 1, Ω_B = 0.05, Λ = 0,
and negligible tensor mode contribution. This theory has a coherence angle of order 15',
substantially smaller than the scales that degree scale experiments probe. The results we get are
similar to those obtained assuming uncorrelated gaussian noise at each point differenced
on the sky. So while we shall find rather strong evidence against such theories from the
UCSB SP91 data, we expect that the constraints would be weaker for gaussian theories
with a large coherence angle, such as inflationary theories where there is a large tensor
mode contribution (Crittenden et al. 1993).

As a concrete example of a degree scale CMBR measurement, we shall analyse the
UCSB SP91 experiment (Gaier et al. 1992; Schuster et al. 1993). This is a 'single
difference' experiment; to generate a single data point, the beam moves in a sinusoidal pattern,
with the antenna temperature integrated antisymmetrically. The result approximates the
first spatial derivative of the fluctuations. A set of results consists of nine to fifteen data
points (temperature differences), on an arc of constant declination on the sky with 2.1
degrees separation between points. To correct for atmospheric and other drifts, a best-fit line
is removed. The Schuster et al. (1993) data set currently has the lowest error per pixel
(5 × 10^-6 for the four channel average) reported for any CMBR anisotropy measurement.
Other ground-based experiments share many of these features, although some integrate
their intensities in such a way as to approximate the second or third derivative of the
fluctuations rather than the first and are called double- or triple-difference experiments.
Depending on how many spatial derivatives an experiment takes, a gradient region will
leave certain 'signature' forms on the data, as shown in Figure 1.

It might seem that a sharp gradient would span too few points to make these
characteristic patterns visible, especially for higher derivatives with complicated signatures.
However, the derivatives are taken by sweeping the beam on the sky or by combining data
from adjacent instrument positions. These methods give rise to an instrument response 
function broad enough that the characteristic signatures will be visible even for an infinitely 
sharp discontinuity in the CMBR. 

A final caveat should be added regarding our use of 'classical' confidence intervals, as
opposed to Bayesian measures of the relative probability of different theories. Classical
confidence intervals are notoriously difficult to interpret. We have used them mainly out of
expedience: it is far easier to construct realisations of gaussian theories than of non-gaussian
theories. When enough non-gaussian maps are available, the following procedure may be
preferable. If one is comparing theory A to theory B, the change in the relative odds following
a measurement of a continuous observable X is given by the 'Bayes factor'

$$ \frac{P(X|A)}{P(X|B)} $$

where the probability of observing X in the interval dX is P(X|A)dX according to theory A,
and similarly for theory B. As we mention in the conclusions, preliminary results indicate
that according to this more simply interpretable test, the non-gaussian theories may be
significantly favored by the UCSB SP91 data.



2. Choosing Statistics

All current theories of the origin of structure produce fluctuations in the form of a
stationary random process. Any such process may be completely characterized by the set
of all n-point correlation functions. On a one-dimensional data set these may be estimated
as

$$ C_{0 r_1 \ldots r_{n-1}} = \frac{1}{N} \sum_i x_i \, x_{i+r_1} \cdots x_{i+r_{n-1}} $$

where, by convention, 0 ≤ r_1 ≤ r_2 ≤ ... ≤ r_{n-1}.

We shall assume that the data set of interest has zero mean, as in UCSB SP91,
where a best-fit line is subtracted as explained above. We shall adopt the convention of
normalizing the data set to unit variance, in order to concentrate on the shape and not the
amplitude of the signal.

The one-point function C_0 is identically zero, and the two-point function C_{00} is
identically one (because of our unit-variance convention), and so the first nontrivial correlation
is C_{0i}, the two-point correlation function at scale i. This does contain information about
the spectrum of fluctuations (it is the fourier transform of the power spectrum) but it
is no help in distinguishing gaussian from non-gaussian data.
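
As an illustration of the estimator just described, here is a minimal Python sketch. The function names `normalize` and `corr`, the 1/N normalization, the choice to drop terms that run off the end of the data, and the 13 sample values are all assumptions made for this example, not details taken from the experiment.

```python
import numpy as np

def normalize(x):
    """Shift a data set to zero mean and scale it to unit variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return x / x.std()

def corr(x, *lags):
    """Estimate C_{0 r1 r2 ...}: (1/N) * sum_i x[i] * x[i+r1] * x[i+r2] * ...

    Assumption: lags are non-negative; products that would index past the
    end of the data are simply left out of the sum.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = n - (max(lags) if lags else 0)   # number of usable starting points
    prod = x[:m].copy()
    for r in lags:
        prod *= x[r:r + m]
    return prod.sum() / n

# Made-up 13-point data set, for illustration only.
x = normalize([0.3, -1.2, 0.8, 2.5, 2.1, -0.4, -0.9, 0.1, -1.5, 0.6, -0.7, -1.1, -0.7])

c0 = corr(x)          # one-point function: zero by construction
c00 = corr(x, 0)      # two-point function at zero lag: one by construction
c000 = corr(x, 0, 0)  # three-point function at zero lag
```

With this convention `corr(x, 0, 1)` is C_001 and `corr(x, 1, 2)` is C_012.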

The three-point function C_{0ij} is a much more promising test for non-gaussianity. For
gaussian noise, <C_{0ij}> = 0 for all i and j (although for finite data sets there will be
random fluctuations about the expected value of zero). For non-gaussian skies we expect
non-zero three-point functions. For example, consider the three-point function C_{000},
which for unit-variance data is the skewness. A
data set containing a positive 'bump' has several outlying high points, and thus positive
skewness. A downward-pointing bump will lead to negative skewness. Either way, the
high absolute value of the skewness could be used to distinguish a data set drawn from a
gaussian model from one drawn from a region containing a bump (which, from fig. 1, is a
likely signature of non-gaussianity in a single-difference experiment).

As we shall show, skewness is not a very powerful statistic for reliably detecting non- 
gaussianity in a noisy experiment. We can improve on its performance in two ways. 

First, if the CMBR is non-gaussian, a 'bump' marking a gradient region may span 
two or more adjacent points. Even if the region of steep gradient on the sky were infinitely 
sharp, it would register in at least two data points because of the instrument's response 
function. Skewness fails to take advantage of these correlations among closely neighboring 
points (obviously, since skewness is invariant under spatial scrambling of the data points.) 
We can remedy this shortcoming by combining several adjacent points; for example, to
look for bumps of width on the order of q, define

$$ S_q = \frac{1}{N-q+1} \sum_{i=1}^{N-q+1} \left( \frac{1}{q} \sum_{j=i}^{i+q-1} x_j \right)^3 $$

This statistic responds much more sharply to several adjacent high points than to the same
number of high points scattered randomly over the data set, so it better distinguishes actual
non-gaussian bumps from noise. The absolute value of S_q will be near zero if no bump
exists and strongly non-zero if there is a single bump.
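
The contrast between adjacent and scattered high points can be sketched in a few lines of Python. The function name `s_q`, the averaging normalization, and the two toy data sets are assumptions chosen for this example; only the structure (cube of a running q-point combination, summed over positions) follows the text.

```python
import numpy as np

def normalize(x):
    """Zero mean, unit variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return x / x.std()

def s_q(x, q):
    """Mean cube of the running q-point average (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    running = np.convolve(x, np.ones(q) / q, mode="valid")  # N - q + 1 groups
    return np.mean(running ** 3)

# Same three high points, either adjacent or spread out (toy data).
clustered = normalize([0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0])
scattered = normalize([2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2])

skew_clustered = s_q(clustered, 1)   # q = 1 reduces to the skewness
skew_scattered = s_q(scattered, 1)
s3_clustered = s_q(clustered, 3)
s3_scattered = s_q(scattered, 3)
```

The two skewness values coincide because skewness ignores ordering, while the q = 3 statistic is much larger in absolute value for the clustered set.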

Of course, S_q is not equal to a simple three-point function, but except for the treatment
of points near the edges of a data set, it is equivalent to a linear combination of three-point
functions. We will expand on this point later, when we show that S_3 is nearly optimal,
among all linear combinations of a certain set of three-point functions, at detecting bumps
of width near three.

A second way to improve the performance of almost any statistic which detects bumps
in a data set is to apply it not to the entire data set but to shorter subsets or 'windows' of
length L < N. The final statistic S_{q;L} is defined as the absolute value of the most extreme
(positive or negative) S_q found in any of the N - L + 1 possible window positions. Use
of these 'sliding windows' improves the statistic's performance for several reasons. Most
importantly, it prevents a positive gradient region in one part of the data from cancelling
a negative gradient in another region (by isotropy, both signs are equally likely to occur).
The procedure also reduces the effects of noise on the statistic's probability distribution
by concentrating on only a few points around each gradient signature.

If the data set contains a bump with some number p of adjacent 'strong' (highly
positive or highly negative) points, we expect the best results when L ≈ p + 2(q - 1). This
allows the window to contain every group of q adjacent points which includes at least one
'strong' point, and no groups of q points with no 'strong' point. S_q is most sensitive to
bumps with about q 'strong' points. We generally choose p = q - 1, so

$$ L = 3(q-1) \qquad (3) $$

We will show that both in monte carlo runs and on actual experimental data, statistics
perform much better on sliding windows of about this scale than on entire data sets. For
long data sets, one might better consider the probability distribution of S_q over window
positions, rather than the maximal value, to reduce sensitivity to a few extreme points.

Our favored choice of statistic will be S_{3;6}, as q = 3 will be sensitive to bumps only
slightly wider than the experimental response function, and can thus detect gradient
regions whose width (relative to the two-degree scale set by the experiment) is fairly low
but non-zero. Equation (3) then sets the window length to L = 6.
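
The sliding-window construction can be sketched as follows. This is a minimal illustration, not the authors' code: the names `s_q` and `s_q_windowed`, the averaging normalization inside `s_q`, and the decision to normalize the whole data set once (rather than each window separately) are assumptions. The toy data set contains one positive and one negative bump, so the whole-set statistic cancels while the windowed one does not.

```python
import numpy as np

def s_q(x, q):
    """Mean cube of the running q-point average (assumed normalization)."""
    running = np.convolve(x, np.ones(q) / q, mode="valid")
    return np.mean(running ** 3)

def s_q_windowed(x, q, L=None):
    """Most extreme |S_q| over all N - L + 1 windows of length L.

    L defaults to 3(q - 1), the window length given by equation (3).
    """
    x = np.asarray(x, dtype=float)
    if L is None:
        L = 3 * (q - 1)
    if not q <= L <= len(x):
        raise ValueError("need q <= L <= N")
    return max(abs(s_q(x[i:i + L], q)) for i in range(len(x) - L + 1))

# Toy data: a positive bump and a mirror-image negative bump.
x = np.array([0, 0, 2, 2, 2, 0, 0, 0, -2, -2, -2, 0, 0], dtype=float)
x = (x - x.mean()) / x.std()

whole_set = s_q(x, 3)          # the two bumps cancel
windowed = s_q_windowed(x, 3)  # each bump is seen on its own
```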

3. Monte Carlo Results 

To test the relative power of different statistics, we devised a monte carlo technique
based on the UCSB SP91 experiment. A large number of trial data sets {x_i : 1 ≤ i ≤ N}
were generated. Typically, N = 13 to match the UCSB SP91 experiment. Half these data
sets were generated from a 'null' gaussian model and half from a 'bumpy' non-gaussian
model.

The null data sets could be generated in either of two ways. G. Efstathiou (private
communication 1993) generously provided 1000 sets of 13 points, based on his
computer-generated 'standard inflation-plus-CDM' skies and his simulation of the properties of the
UCSB SP91 experiment. Alternatively, null sets could be generated by simply drawing N
independent, random points from a gaussian distribution. These methods returned similar
results (in that most interesting statistics S({x_i}) calculated from the N points had similar
distributions for the two null models). The statistics are apparently not greatly affected
either by correlations arising from the CMBR's power spectrum (no great surprise, since
the two-degree scale of this experiment is on the low-frequency side of the power spectrum
for this theory, and taking a spatial derivative further shifts the spectrum toward high
spatial frequencies) or by correlations arising from the instrument's response function (again
no surprise; the signature that we're looking for is symmetric and thus orthogonal to the
antisymmetric response function. Were we searching for point sources rather than
discontinuities, response-function-induced correlations would be a more powerful confounding
factor).

The other half of the data sets were generated from a non-gaussian 'bumpy' model.
A single bump, centered at some random location n_0 within the data set, was laid down:

$$ x_n = e^{-\alpha (n - n_0)^2} $$

where we typically used α = 0.5, corresponding to a bump with full-width-half-max of 2.8
pixels. Incidentally, n_0 is not necessarily an integer; the center of the bump can lie between
pixels. Independent gaussian noise was then added to each point to simulate instrument
noise. In both the null and 'bumpy' models, each data set {x_i} was normalized to zero
mean and unit variance.

For each statistic S which we want to investigate, we can calculate probability
distributions of S in the 'null' and 'bumpy' models. If S is a powerful detector of non-gaussianity,
there should be little or no overlap between the two distributions. Figure 2 shows these
distributions for our favorite statistic, S_{3;6}, using a signal-to-noise ratio of 1.25 (about the
same level as seen in the UCSB SP91 data set). There is indeed very little overlap: the
value of S_{3;6} calculated on a 'bumpy' data set typically exceeds the values of all but a small
fraction of the null sets. This "small fraction" varies from one 'bumpy' set to another, but
its average value is 1.2% (from now on, we'll refer to this as "a mean significance of 1.2%").
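
A rough sketch of such a monte carlo run is given below. It uses the simpler of the two null models (independent gaussian points) and makes several assumptions that are not taken from the text: the helper names, the normalization of S_q, the uniform draw of the bump center, the use of 500 trials, and the definition of signal-to-noise as the ratio of signal to noise rms after the bump is normalized to unit variance. It should therefore not be expected to reproduce the quoted 1.2% exactly.

```python
import numpy as np

def normalize(x):
    x = x - x.mean()
    return x / x.std()

def s_q(x, q):
    running = np.convolve(x, np.ones(q) / q, mode="valid")
    return np.mean(running ** 3)

def s_q_windowed(x, q=3, L=6):
    return max(abs(s_q(x[i:i + L], q)) for i in range(len(x) - L + 1))

def null_set(rng, n):
    """N independent gaussian points, normalized."""
    return normalize(rng.standard_normal(n))

def bumpy_set(rng, n, alpha=0.5, snr=1.25):
    """Gaussian-profile bump at a random (non-integer) center plus noise."""
    n0 = rng.uniform(0, n - 1)
    bump = normalize(np.exp(-alpha * (np.arange(n) - n0) ** 2))
    a = 1.0 / np.sqrt(1.0 + snr ** 2)   # noise amplitude
    b = snr * a                         # signal amplitude, so b / a = snr
    return normalize(a * rng.standard_normal(n) + b * bump)

def mean_significance(n_trials=500, n=13, seed=0):
    """Average, over bumpy sets, of the fraction of null sets scoring at least as high."""
    rng = np.random.default_rng(seed)
    null = np.array([s_q_windowed(null_set(rng, n)) for _ in range(n_trials)])
    bumpy = np.array([s_q_windowed(bumpy_set(rng, n)) for _ in range(n_trials)])
    exceed = (null[None, :] >= bumpy[:, None]).mean(axis=1)
    return exceed.mean()

result = mean_significance()
```

Swapping `s_q_windowed` for another statistic in `mean_significance` gives the kind of side-by-side comparison described in this section.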

The performance degrades if instead of using sliding windows, we simply calculate
S_3 on the entire data set at once; the mean significance rises from 1.2% to 3.4%. For
comparison, figure 3 shows the distributions of absolute values of skewness (calculated in
sliding windows of width 6) of data sets from the null and 'bumpy' models. The overlap is
tremendous; at this noise level, skewness could never reliably distinguish the two models.
Performance is even worse if we calculate skewness on the whole data set instead of on
a sliding subset. Evidently, S_3 is a much more powerful detector of non-gaussian bumps
than is skewness.

To test our assertion that each statistic S_q is most sensitive to bumps of width of about
q, we performed monte carlo runs like those described above for a variety of 'bumpy' models
with bumps of different widths. For each model, we measured the average significance
obtained using S_{q;3(q-1)} for q = 1, ..., 5. The results are shown in figure 4, which plots
mean significance vs. full-width-half-max of the bump model for each of the five statistics.
As expected, each statistic S_q reaches its maximum power (lowest mean significance) for
bumps of full-width-half-max near q (or slightly higher).

4. Optimal Statistics 

We have justified the statistic S_{q;3(q-1)} by an incremental process, starting with
skewness, the simplest detector of non-gaussianity, and modifying it to counter its obvious
shortcomings. Our monte carlo results showed that the resulting statistic is a much better
detector of non-gaussian 'bumps' than is skewness, but we would like to go further and
show that it is optimal or near-optimal for this job, at least over certain classes of related
statistics.

The procedure of calculating S_{q;3(q-1)} can be separated into three steps. First we
convolve the data set with a square tooth of width q (a function equal to one at q adjacent
points, and zero elsewhere). Next we take the third power of the convolved data points.
Finally we add the results for each connected subset or 'window' of length 3(q - 1) within
the data set, and take the most extreme value of S_q as our final statistic S_{q;3(q-1)}. For
each of these three steps, we can investigate whether a modification of the procedure would
produce stronger results.

4.1 Optimal Choice of Convolution Function 

Our choice to convolve the data with a square tooth function amounts to a filtered 
deconvolution about the gradient signature we are searching for (in this case a bump), 
with the high-frequency components suppressed. It is not better to use the rigorous decon- 
volution function of the signature we are seeking; this method is notoriously vulnerable to 
high frequency noise. But it is worthwhile to see whether convolving the data with some 
other function, rather than the arbitrarily chosen square tooth, would produce a better 
statistic. 

As mentioned before, S_q is nearly equivalent to a linear combination of several
three-point functions. Except for its treatment of points near the window edges, S_3 is
proportional to

$$ C_{000} + 2(C_{001} + C_{011}) + 2C_{012} + (C_{002} + C_{022}) \qquad (4) $$
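
The proportionality in (4) can be checked numerically. The sketch below is an illustration under one stated assumption: the data are treated as periodic (indices wrap around), which removes the edge effects mentioned above and makes the identity exact, with a constant of proportionality of 3. The helper name `c3` is chosen for this example.

```python
import numpy as np

def c3(x, i, j):
    """Cyclic three-point function C_{0ij} = mean over n of x[n] x[n+i] x[n+j]."""
    return np.mean(x * np.roll(x, -i) * np.roll(x, -j))

rng = np.random.default_rng(0)
x = rng.standard_normal(50)

# Cube of the 3-point running sum, averaged over all (cyclic) positions.
lhs = np.mean((x + np.roll(x, -1) + np.roll(x, -2)) ** 3)

# Three times the combination of three-point functions in equation (4).
rhs = 3 * (c3(x, 0, 0)
           + 2 * (c3(x, 0, 1) + c3(x, 1, 1))
           + 2 * c3(x, 1, 2)
           + (c3(x, 0, 2) + c3(x, 2, 2)))
```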

To the extent that this approximation holds, the search for the optimal convolution
function is equivalent to a search for the optimal linear combination of three-point
functions. We define the generalized three-point function S_{G3} by:

$$ S_{G3} = C_{000} + t\,(C_{001} + C_{011}) + u\,C_{012} + v\,(C_{002} + C_{022}) \qquad (5) $$

C_{001} and C_{011} share the same coefficient for reasons of symmetry, as do C_{002} and C_{022}.
There is no coefficient before C_{000} because an overall multiplicative constant does not affect
a statistic's ability to distinguish between distributions of different shapes.

S_{G3} includes all six of the three-point functions which involve no more than three
adjacent points at a time. Wider-ranging three-point functions, such as C_{013}, are not
included because we are attempting to generalize S_3, which searches most powerfully for
bumps spanning 2 or 3 points.

To estimate the optimal coefficients t, u, and v, the procedure is as follows. We first
adopt two simple analytic models of null (gaussian) and 'bumpy' (non-gaussian)
distributions. We then calculate the mean of S_{G3} for the bumpy model, and its mean and variance
for the null model. We define the 'bump-resolving power' R as the distance, measured in
standard deviations of the null model, between the means of the null and bumpy models:

$$ R = \frac{\langle S_{G3} \rangle_{\rm bump} - \langle S_{G3} \rangle_{\rm null}}{\sqrt{\langle S_{G3}^2 \rangle_{\rm null} - \langle S_{G3} \rangle_{\rm null}^2}} \qquad (6) $$

Finally we maximize R as a function of t, u, and v. The quantity R is not the most
accurate measure one could think of, but is at least straightforwardly calculable.

To simplify the calculation, we work in an infinitely long window of length N → ∞,
instead of the window of length L = 6 which we shall use in practice. This underscores the
fact that choosing a statistic (like S_q) and choosing a window size are two separate ideas;
the idea of sliding windows is not specific to S_q but improves the performance of almost
any statistic.
4.1.1 Null Model:

The null model consists of N points drawn from a gaussian distribution of zero mean
and unit variance, with one important modification: each data set is zero mean. If {y_i}
are a set of independent points drawn from a gaussian distribution,

$$ x_i = y_i - \frac{1}{N} \sum_{j=1}^{N} y_j $$

This sounds like a trivial change, especially for large data sets which tend to have means
very close to zero anyway, but the explicit normalization makes a surprisingly large
difference in the calculation even as N → ∞. We do not explicitly normalize each set to unit
variance, because it can be shown to make no difference in the N → ∞ limit.

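In order to calculate R from equation (6), we need to know <S_{G3}> and <S_{G3}^2>
for this model. <S_{G3}> is clearly zero, and using other symmetry properties,

$$ \langle S_{G3}^2 \rangle = \langle C_{000}^2 \rangle + 2t^2 \langle C_{001}^2 \rangle + u^2 \langle C_{012}^2 \rangle + 2v^2 \langle C_{002}^2 \rangle $$

for a gaussian distribution with unit variance. So we need to calculate expectations such as

$$ \langle C_{000}^2 \rangle = N^{-2} \sum_{i=1}^{N} \sum_{j=1}^{N} \langle x_i^3 x_j^3 \rangle $$

Expectations such as <x_i^3 x_j^3> can be calculated using Wick's theorem, starting with
the fundamental 2-point expectation <x_i x_j> = δ_ij - N^{-1}. The -N^{-1} term is due to the
normalization of the data {x_i} to zero mean. The results, to leading order in N, are:

$$ \langle C_{000}^2 \rangle = 6N^{-1}, \quad \langle C_{001}^2 \rangle = 2N^{-1}, \quad \langle C_{012}^2 \rangle = N^{-1}, \quad \langle C_{002}^2 \rangle = 2N^{-1} $$

All other terms in the expression for <S_{G3}^2> are zero, so

$$ \langle S_{G3}^2 \rangle_{\rm null} \to (6 + 4t^2 + u^2 + 4v^2)\, N^{-1} \quad {\rm as} \quad N \to \infty \qquad (7) $$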
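
As a worked example of the Wick's theorem step, the leading coefficient 6 in equation (7) can be recovered as follows. This is a sketch under the assumption that C_000 is the average of x_i^3 over the N points; it uses only the 2-point expectation quoted above.

```latex
\begin{aligned}
\langle x_i^3 x_j^3 \rangle
  &= 9\,\langle x_i^2 \rangle \langle x_j^2 \rangle \langle x_i x_j \rangle
   + 6\,\langle x_i x_j \rangle^3 , \\
\sum_{i,j} \langle x_i x_j \rangle
  &= \sum_{i,j} \left( \delta_{ij} - N^{-1} \right) = 0 , \\
\sum_{i,j} \langle x_i x_j \rangle^3
  &= N \left( 1 - N^{-1} \right)^3 - N(N-1)\,N^{-3} \;\to\; N , \\
\langle C_{000}^2 \rangle
  &= N^{-2} \sum_{i,j} \langle x_i^3 x_j^3 \rangle \;\to\; 6\,N^{-1} .
\end{aligned}
```

The first sum vanishes only because of the zero-mean constraint (and because <x_i^2> = 1 - N^{-1} is the same for every point, so it factors out). For fully independent points the same calculation would give 15 N^{-1}, which is why the explicit normalization matters even as N → ∞.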
4.1.2 Bumpy Model:

To represent non-gaussian, 'bumpy' data sets we take

$$ x_n = a\,y_n + b\,m_n $$

where the 'noise' y_n is drawn from a zero mean, unit variance normal distribution, and
m_n is the underlying 'bumpy' model:

$$ m_n = g \left( e^{-\alpha (n - n_0)^2} - c \right), \quad {\rm where} \quad c = N^{-1} \left( \frac{\pi}{\alpha} \right)^{1/2}, \quad g = N^{1/2} \left( \frac{2\alpha}{\pi} \right)^{1/4} $$

The bump center n_0 is chosen randomly and is not necessarily an integer. The
constants g and c are chosen so that, as N → ∞, m_n will also have zero mean and unit
variance (when averaged over all n_0). The purpose of a and b is to set the signal-to-noise
ratio: SNR = b/a, and a^2 + b^2 = 1. It is straightforward to check that all terms involving
the noise term give zero in S_{G3} (this would not be true for higher moments). Averaging
over n_0 converts all sums to integrals, which in the limit N → ∞ yield the results:

$$ \langle C_{000} \rangle = b^3 N^{1/2} \left( \frac{2\alpha}{\pi} \right)^{3/4} \left( \frac{\pi}{3\alpha} \right)^{1/2}, \qquad \langle C_{001} \rangle = \langle C_{000} \rangle\, e^{-2\alpha/3} $$

$$ \langle C_{002} \rangle = \langle C_{000} \rangle\, e^{-8\alpha/3}, \qquad \langle C_{012} \rangle = \langle C_{000} \rangle\, e^{-2\alpha} $$

For α = 0.5 (corresponding to a bump with full-width-half-max of about 2.8 data
points, about the scale we hope to detect with S_{G3}),

$$ \langle S_{G3} \rangle_{\rm bump} = b^3 N^{1/2} \left[ 0.6133 + 0.8789\,t + 0.2256\,u + 0.3233\,v \right] \qquad (8) $$
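
The four numerical coefficients in equation (8) can be reproduced from closed-form gaussian integrals. The sketch below is an illustration, with the function name `bump_coefficients` chosen for this example; it assumes the gaussian bump profile exp(-α(n - n_0)^2) normalized to unit variance as N → ∞, and uses the symmetry between C_001 and C_011 and between C_002 and C_022.

```python
import math

def bump_coefficients(alpha):
    """Coefficients k0, kt, ku, kv in <S_G3>_bump = b^3 N^(1/2) [k0 + kt t + ku u + kv v]."""
    k0 = (2 * alpha / math.pi) ** 0.75 * math.sqrt(math.pi / (3 * alpha))
    kt = 2 * k0 * math.exp(-2 * alpha / 3)   # C_001 and C_011 contribute equally
    ku = k0 * math.exp(-2 * alpha)           # C_012
    kv = 2 * k0 * math.exp(-8 * alpha / 3)   # C_002 and C_022 contribute equally
    return k0, kt, ku, kv

k0, kt, ku, kv = bump_coefficients(0.5)
```

Calling `bump_coefficients` with a different α shows how quickly the wider-lag terms lose weight as the bump narrows.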

Now we use (6) to estimate the statistic's power to distinguish gaussian from
non-gaussian models:

$$ R = \frac{\langle S_{G3} \rangle_{\rm bump}}{\sqrt{\langle S_{G3}^2 \rangle_{\rm null}}} = b^3 N\, \frac{0.6133 + 0.8789\,t + 0.2256\,u + 0.3233\,v}{\sqrt{6 + 4t^2 + u^2 + 4v^2}} $$

The optimal values of t, u, and v are those which maximize R; a numerical search for
these yields the optimized three-point statistic:

$$ S_{G3} = C_{000} + 2.15\,(C_{001} + C_{011}) + 2.21\,C_{012} + 0.79\,(C_{002} + C_{022}) \qquad (9) $$

As we hoped, this result is similar to (4), confirming that our original combination of
three-point functions (or equivalently, our choice to convolve the data set with a square
tooth) is probably among the most powerful methods. Since this calculation was
approximate, we checked it with more precise monte carlo runs, which confirmed that
no other choice of convolution function gives dramatically better results. We have settled
on the choice of a square tooth as the best combination of simplicity and power.
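
The optimization can also be done in closed form, which gives a quick check on the numerical search. The sketch below is an illustration: the function names are chosen for this example, and it relies on the standard result that a ratio of the form (k·c)/sqrt(Σ w c²) is maximized when each coefficient is proportional to its mean divided by its variance weight. It also compares the square-tooth coefficients of (4), t = 2, u = 2, v = 1, against the optimum.

```python
import math

def bump_means(alpha):
    """Bumpy-model means of each term, in units of <C_000>."""
    return (1.0,
            2 * math.exp(-2 * alpha / 3),   # multiplies t
            math.exp(-2 * alpha),           # multiplies u
            2 * math.exp(-8 * alpha / 3))   # multiplies v

NULL_WEIGHTS = (6.0, 4.0, 1.0, 4.0)  # N <S^2>_null = 6 + 4 t^2 + u^2 + 4 v^2

def optimal_coefficients(alpha):
    """(t, u, v) maximizing R, with the C_000 coefficient fixed to 1."""
    k = bump_means(alpha)
    ratios = [ki / wi for ki, wi in zip(k, NULL_WEIGHTS)]
    return tuple(r / ratios[0] for r in ratios[1:])

def resolving_power(t, u, v, alpha):
    """R up to the common factor b^3 N <C_000>-normalization."""
    k = bump_means(alpha)
    c = (1.0, t, u, v)
    num = sum(ki * ci for ki, ci in zip(k, c))
    den = math.sqrt(sum(wi * ci ** 2 for wi, ci in zip(NULL_WEIGHTS, c)))
    return num / den

t, u, v = optimal_coefficients(0.5)
r_opt = resolving_power(t, u, v, 0.5)
r_square = resolving_power(2.0, 2.0, 1.0, 0.5)   # square tooth, equation (4)
ratio = r_square / r_opt
```

In this approximation the square tooth gives up well under one percent of the resolving power of the optimum, consistent with the conclusion drawn in the text.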

4.2 Optimal choice of power 

After convolving the data set with a square tooth of width q, Sq requires us to sum 
the third powers of the convolved data points. We should investigate whether taking 
some power other than the third would give better results. The higher the power, the 
more emphasis is given to the most extreme points in the data set (after convolution.) 
Emphasizing extreme points has the advantage of reducing the effects of noise, since it 
prevents several small bumps, caused by noise, from matching the effect of a large bump 
in the signal. The drawback is that for a real signal, the highest points will have neighbors 
which are also higher than average, since neither a physical gradient region on the sky 
nor the instrument's response function have perfectly sharp edges. High points due to 
extreme values of noise will not in general have unusual neighbors. So focusing too heavily 
on extreme points throws away information which would help distinguish a physical signal 
from noise. The optimal choice of power is that which balances these two competing effects. 

The answer is not obvious and is clearly model-dependent, so we turn again to monte
carlo results. Consider a class of statistics

$$ S^{(p)}_{3;6} = \frac{1}{4} \sum_{i=1}^{4} \left( \frac{1}{3} \sum_{j=i}^{i+2} x_j \right)^{p} \qquad (10) $$

These statistics are calculated in sliding windows of width 6; they differ from S_{3;6} only in
the use of the p-th power rather than the third. We investigated their ability to distinguish
two different 'bumpy' models from white noise. One model, described in the earlier section
on monte carlo results, used bumps of gaussian profile with full-width-half-max of 2.8 data
points. The other model used bumps of a square tooth profile with three adjacent, equally
high points randomly placed in the data set. Gaussian white noise was added to both
models at a signal-to-noise ratio of 1.25. For the gaussian-profile model, we expect the
resolving strength to peak at some finite power p, while the square bumps should be best
resolved at very high p since the argument for lower powers applies only to bumps in which
points near the bump have non-zero expectations.

The results are shown in Figure 5, which plots mean significance of detection vs. p.
For detection of the gaussian-profile bumps, the optimal power was p = 3. For the square
bumps, higher powers are always better, as expected. Again, neither gradient regions on
the CMBR nor instrument response functions are expected to have sharp cutoffs, so we
view the gaussian profiles as more physically realistic than the square ones, and continue
to use p = 3. However, the results show that p = 3 is only slightly better than other
nearby choices, so we will not hesitate to use p = 4 later in the paper when we generalize
the statistic to search for other types of gradient signatures (because even powers will be
more convenient than odd ones).

Also, we should note that S^{(p)}_{3;6} becomes very simple as p → ∞. Our statistic is
then equivalent (in its relative ranking of different data sets, which is the only thing that
matters) to simply convolving the data with a square tooth of width three, and then
choosing the most extreme point. The 'sliding window' becomes irrelevant in this limit.
Readers who feel that this extra simplicity is worth sacrificing some power may prefer to
use this straightforward procedure.
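
The simplified procedure amounts to only a couple of lines. A minimal sketch, with the function name `most_extreme_smoothed` and the toy input chosen for this example:

```python
import numpy as np

def most_extreme_smoothed(x, q=3):
    """Convolve with a square tooth of width q and return the most extreme value."""
    x = np.asarray(x, dtype=float)
    smoothed = np.convolve(x, np.ones(q), mode="valid")  # running q-point sums
    return np.max(np.abs(smoothed))

# Running 3-point sums of this toy series are 3, 4, 3, 0.
value = most_extreme_smoothed([0, 1, 2, 1, 0, -1])
```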

4.3 Optimizing Window Size 

The final choice that we have made is to use windows of length L = 3(q - 1) (equation
(3)). There is not much of interest to say about this choice. We gave a rough justification
earlier, and monte carlo runs confirm that it is the best or nearly the best length for a
wide range of q (when searching for bumps of full-width-half-max near q).

5. Experimental Results 

We focus on a run of the UCSB SP91 experiment which observed 13 points in four
frequency channels (Gaier et al. 1992). Cosmological fluctuations should be
frequency-independent, so we can average the four channels to better distinguish cosmological
fluctuations from instrument noise and, possibly, from astrophysical and atmospheric effects.
The mean of the four channels is shown in figure 5.

We applied the statistic S_{3;6} to data from each of the four channels and to their mean.
We compared the results to G. Efstathiou's 1000 simulations of UCSB SP91's view of
standard inflation skies (after adding gaussian noise at the estimated experimental level,
and removing a best-fit line, as was done to the actual UCSB SP91 data). Channels 1 and
2 (the two lowest-frequency channels, covering 25-27.5 GHz and 27.5-30 GHz respectively)
appear highly non-gaussian. Both achieve 0.1% significance (only 1 of the 1000 inflationary
skies gave as large a value of S_3). The average of all four channels does nearly as well, at
0.2% significance. Channels 3 and 4 were not conclusively non-gaussian.

By contrast, if we had used pure skewness, we would find the mean of the four channels
non-gaussian at only 5% significance. If we had used skewness on the whole data set,
instead of with sliding windows, we would conclude the mean of the four channels to be
non-gaussian at only 11% significance.

We chose S_3 (rather than some other S_q) as the preferred statistic for analyzing data
sets because we expect the perceived size of the gradient regions to be not much larger
than the lower limit set by the instrument response function (two or three pixels). For
comparison, Figure 6 shows the significance levels at which all four channels, as well as
their mean, can be shown non-gaussian by the various {S_q}, with q ranging from 1 to 13.
S_{3;6} provides the strongest overall results, although S_{2;3} and S_{4;9} both outperform it on
individual channels.

These results show that the UCSB SP91 data is strongly non-gaussian, but non- 
gaussian data does not necessarily imply non-gaussian cosmology. The data sets contain a 
visible spike spanning about 2 pixels (clearly visible in figure 5.) This could be the signature 
of a sharp gradient generated by a non-gaussian cosmological model, but there are several 
other possibilities. These include galactic foreground sources, extragalactic but noncosmo- 
logical sources (unlikely; such sources could not easily match the spatial structure of the 
data), or a systematic instrumental effect such as sidelobe pickup. With UCSB SP91's 
limited range of frequency (25-35 GHz) there is not enough spectral information to reli- 
ably distinguish astrophysics from cosmology (i.e., by fitting to the spectra of synchrotron 
or bremsstrahlung radiation). However, if a gaussian theory with small correlations on 
two-degree scales, like standard inflation, is to be believed, we must assume that the signal 
in points number 7 and 8 is non-cosmological, and remove those points from the data set. 
The fluctuations of the remaining eleven points may then be used to impose constraints 
on the cosmological fluctuations. We removed a best-fit line from the remaining points,
because the line already removed from the scan must be assumed invalid if two points
were contaminated. The remaining points have a chi-squared of 9.9, quite reasonable for
9 degrees of freedom (eleven points less the two parameters of the fitted line). We then
compared the r.m.s. to those of the standard
inflation simulated data sets (with points 7 and 8 likewise removed, and a new best-fit line
subtracted from the remaining points). After this procedure, only 1 of the 1000 simulated
data sets had an r.m.s. as low as that of the mean of UCSB SP91's four channels. We
repeated the procedure with points 6, 7, 8, and 9 removed (since points near the bump
may also be suspect) and found the UCSB SP91 data quieter than all but 6 of the
1000 simulations.
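
The r.m.s. comparison just described (drop the suspect points, refit and subtract a straight line, then count how many simulated scans are at least as quiet) can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical, the scans are white-noise stand-ins rather than standard-inflation simulations, and the points are taken to be evenly spaced.

```python
import numpy as np

def rms_without_points(dT, drop):
    """Remove the points listed in `drop` (0-based indices), subtract a
    best-fit straight line from the rest, and return the r.m.s. residual."""
    dT = np.asarray(dT, dtype=float)
    pos = np.arange(dT.size)                      # assumed evenly spaced
    keep = np.setdiff1d(pos, np.asarray(drop))
    slope, intercept = np.polyfit(pos[keep], dT[keep], 1)
    residuals = dT[keep] - (slope * pos[keep] + intercept)
    return np.sqrt(np.mean(residuals ** 2))

def fraction_as_quiet(data_rms, sim_rms):
    """Fraction of simulated scans whose r.m.s. is no larger than the data's."""
    sim_rms = np.asarray(sim_rms, dtype=float)
    return np.count_nonzero(sim_rms <= data_rms) / sim_rms.size

# Stand-in scans: white gaussian noise, NOT standard-inflation skies.
rng = np.random.default_rng(0)
scan = rng.normal(size=13)
scan[[6, 7]] += 5.0                               # artificial spike at points 7 and 8
sims = rng.normal(size=(1000, 13))

drop = [6, 7]                                     # points 7 and 8, 0-based
data_rms = rms_without_points(scan, drop)
sim_rms = np.array([rms_without_points(s, drop) for s in sims])
print(fraction_as_quiet(data_rms, sim_rms))
```

Applying the same masking and line removal to every simulated scan matters: the fit absorbs two degrees of freedom, so the simulations would otherwise look systematically noisier than the data.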

If we are to believe the UCSB SP91 data, the standard inflation theory is caught on 
the horns of a dilemma. If points 7 and 8 of the UCSB SP91 data are of cosmological origin, 
the shape of the data is highly non-gaussian and thus inconsistent with the theory. If the 
two points are contaminated, the measured r.m.s. is too low at high confidence. Other 
gaussian models can be tested in a similar way. Before claiming that we have rejected any 
cosmological model, we must wait to see if these methods demonstrate non-gaussianity in 
other experiments. 

6. Other Experiments 

S_{q;3(q-1)} can be applied without modification to any single-difference experiment. We
recommend q = 3 unless the experiment has a very short distance (well under a degree)
between data points, in which case one should try several larger values of q to search for 
high-gradient regions typical of field ordering theories. 

For double- and triple-difference experiments, the characteristic 'bump' marking gradi- 
ent regions will be replaced by more complicated signatures representing higher derivatives 
of this sudden gradient (as seen in Figure 1). The ideas developed here, with some
modification, should apply to these experiments as well. Recall that the procedure of S_q involves
convolving the data set with a 'square tooth' of width q, then adding the third powers of
the convolved data points within each sliding window (see the section on optimal statistics
for a discussion of these steps). For more complicated signatures, we need to modify both
the convolution function and the choice of the third power.
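
The two steps recalled above can be sketched in Python as follows. This is a minimal illustration, not the definition given in equation (2): it assumes that each window is a run of consecutive raw data points, omits any normalisation by the variance of the data, and leaves open how the per-window values are reduced to a single number. The function name is hypothetical.

```python
import numpy as np

def s_q_windows(x, q, window):
    """For every sliding window of `window` consecutive data points,
    convolve with a square tooth of width q (q >= 2, window >= q) and
    add the third powers of the convolved points. Returns one value
    per window; no normalisation is applied (an assumption)."""
    x = np.asarray(x, dtype=float)
    tooth = np.ones(q)
    values = []
    for start in range(x.size - window + 1):
        segment = x[start:start + window]
        smoothed = np.convolve(segment, tooth, mode="valid")
        values.append(np.sum(smoothed ** 3))
    return np.array(values)

# Noise-free toy scan of 13 points with a two-point bump at points 7 and 8.
scan = np.zeros(13)
scan[6:8] = 1.0
values = s_q_windows(scan, q=3, window=6)   # q = 3, window length 3(q - 1) = 6
print(values.max())                         # 18.0, from the window centred on the bump
```

In the best-placed window the convolved points are 1, 2, 2, 1, whose cubes add to 18; windows that only partly cover the bump give smaller values.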

The square tooth was a natural choice for the convolution function because it
approximates the 'bump' signature we are looking for. We will continue to convolve
with a function of width q that approximates the signature form being sought.
Unfortunately, extra complications arise when the gradient signature being sought crosses zero (as
it does for all but single-difference experiments). The statistics {S_q}, designed to search
for bumps with a width of about q data points, were fairly powerful for a wide range of
other widths as well. But when searching for a signature which changes sign, a mismatch
of widths can leave the statistic searching for a form which is orthogonal to that actually 
present, thus cancelling the result. The more zero-crossings the signature contains, the 
more critical it becomes to use an accurate width. This requires knowledge not only of the 
instrument response function but also of the expected width of the gradient region on the 
sky. The search for non-gaussianity becomes uncomfortably theory-specific. 






Even if the width is chosen perfectly, the convolution of a signature function with an 
approximation of itself will yield several adjacent points which are strongly non-zero but 
alternate in sign as rapidly as the signature itself does. If we added the third (or any 
odd) power of these convolved points, they would cancel one another. A simple solution 
is to raise the convolved data to the fourth power rather than the third, suffering a slight 
decrease in resolving power but gaining robustness. For example, in a third-derivative 
experiment, such as Dragovan et al.'s Python (Dragovan et al. 1993), a sharp gradient
might be best resolved by a statistic of the form 



The coefficients of X_j, X_{j+1}, etc. are fairly obvious guesses, matched to the
expected form of the data (Figure 1). To detect wider gradient regions or other signature
forms, simply use an appropriate approximation to the shapes shown in Figure 1; for
example, a bump of width four would respond to



3 

while in a double-difference experiment, gradient-signature regions with width of about 
three data points would respond well to 



There is no need to modify the 'sliding window' scheme as we look for more intricate 
signature forms; windows of length 3(q - 1) still work quite well.
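
The effect of alternating signs on odd and even powers can be seen in a noise-free toy example. The three-point template used here is purely an illustrative assumption and is not one of the coefficient sets proposed above. In this toy case the cancellation under the third power is partial rather than complete.

```python
import numpy as np

# Hypothetical sign-changing signature (illustrative only).
template = np.array([-1.0, 2.0, -1.0])

# Noise-free scan containing one copy of the signature.
data = np.zeros(15)
data[6:9] = template

# Slide the template along the data; the non-zero responses are
# [1, -4, 6, -4, 1], alternating in sign like the signature itself.
response = np.correlate(data, template, mode="valid")

odd_sum = np.sum(response ** 3)                        # signs partly cancel
even_sum = np.sum(response ** 4)                       # nothing cancels
surviving = odd_sum / np.sum(np.abs(response) ** 3)    # share of cubed signal left

print(odd_sum, even_sum, round(surviving, 2))          # 90.0 1810.0 0.26
```

Only about a quarter of the cubed signal survives the alternation here, whereas every term contributes to the fourth-power sum. The fourth power is also unchanged if the sign of the data is reversed.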

We have performed Monte Carlo simulations which confirm that statistics such as 
these are much more powerful than simple skewness or kurtosis at detecting the signatures 
of sharp gradient regions in double- and triple-difference experiments. However, there 
is another option. Double- and triple-difference results are often constructed in stages, 
starting with single-difference data and combining adjacent points. It may be best to 
look for regions of sharp gradient in the original, single-difference data, where they will 
appear as simple bumps and can be detected by the comparatively robust statistics {S_q}.
The disadvantage to this approach is that single-difference results may be more vulnerable
to systematic errors; we cannot predict in general which approach will work best for all 
experiments. 
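
As a rough illustration of the staging described above, differencing adjacent points turns a simple bump in single-difference data into the sign-changing signatures of the higher-difference schemes. The use of plain adjacent differences is an assumption; the actual weights depend on each experiment's chopping strategy.

```python
import numpy as np

# Toy single-difference scan: a sharp gradient appears as a simple bump.
single = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])

# Combining adjacent points (here: plain differences, an assumption).
double = np.diff(single)   # [ 0.  1.  0. -1.  0.]  one zero crossing
triple = np.diff(double)   # [ 1. -1. -1.  1.]      two zero crossings

print(double)
print(triple)
```

The bump has a single sign and is what {S_q} is designed to find; each further differencing adds a zero crossing, which is what makes the choice of width more delicate in the higher-difference data.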




7. Conclusions



We have proposed a class of statistics which should be quite powerful in detecting a 
wide range of non-gaussian features in one-dimensional data sets. Their greatest potential 
vulnerability is that gaussian data with significant correlations on the scale of the spacing
between data points may be hard to distinguish from non-gaussian forms. This could occur
in cosmological models with unusually strong power spectra at large angular scales
(Crittenden et al. 1993), in experiments with a short distance between data points, or in cases
where correlations introduced by the instrument's response function are similar in form 
to the non-gaussian 'signature' being sought. Any of these factors may make conclusions 
harder to draw, but they should not lead to false rejections of a gaussian theory, as long as 
the null data sets accurately model the gaussian theory and the instrument's properties. 

Our analysis of the UCSB SP91 experiment indicates that the CMBR anisotropy is 
inconsistent with 'standard' inflation. If, as the theory predicts, the CMBR fluctuations 
are gaussian, the non-gaussianity of the data must be due to foreground contamination.
If we discard the contaminated points (those responsible for the non-gaussian shape) we
find a level of fluctuations significantly smaller than the theory predicts. One might worry
about possible 'conspiracies' here: if a high fluctuation in the CMBR actually contributed
to the non-gaussian 'bump', we would throw it away when we removed the contaminated
points. Could this not bias us towards low amplitudes in the remaining points? No, not
in a theory like standard inflation, because the remaining points are so weakly correlated
with the removed points that they still provide a fair sample. Of course, like any other
conclusions drawn from the still very limited data on CMBR anisotropy, our conclusion 
requires confirmation from further experimental results. 

It is also important to check whether current non-gaussian theories are 'non-gaussian
enough' to account for a data set like UCSB SP91. We have calculated our statistic S_{3;6}
on a small number of sky maps produced by simulations of cosmic texture (Coulson et
al. 1993), with the preliminary result that the distribution of values of S_{3;6} is indeed
substantially broader than that of standard inflation. Further simulation results will soon
allow a more definite conclusion, along the lines indicated in the introduction (equation
(1)).

We conclude that either 
i) The CMBR fluctuations are non-gaussian, or gaussian with a larger coherence angle than
standard inflation. In either case the anisotropy pattern will hold valuable new information
about the mechanisms of structure formation.

ii) The fluctuations are gaussian with a small coherence angle and an amplitude too small for the
standard inflationary theory, but are overlaid with significant foreground contamination.

iii) The UCSB SP91 data set is not a representative sample of the microwave sky. 
Figure 1: The characteristic 'signatures' which would result from a step function or
region of very sharp gradient in the CMBR, for single- through triple-difference experiments
(that is, experiments whose results approximate the first through the third derivatives of
the CMBR intensity). The purpose of this paper is to develop methods of detecting these
signatures against a background of instrument and other noise. 

Figure 2: Probability distributions of the statistic S_{3;6} (that is, S_3 as defined in (2),
calculated in 'sliding windows' of length six) for the 'bumpy' and null models described in
the section on Monte Carlo results. The two distributions show very little overlap, so S_{3;6}
appears to be powerful at detecting certain types of non-gaussianity.

Figure 3: Probability distributions of skewness, calculated in 'sliding windows' of 
length six, for the same two models used in Fig. 2. There is considerable overlap; skewness 
is much less powerful than S_{3;6} at distinguishing between data sets drawn from these two
models. 

Figure 4: The average significance levels achieved by the statistics S_{q;3(q-1)} for
q = 1...5, when searching for bumps of various widths (full width at half maximum ranging from
1 to 10). We see that to detect bumps of width ~ t data points, it is best to choose q
roughly equal to (or slightly lower than) t.

Figure 5: The average significance levels achieved by the statistics defined in
(10) as a function of the power p, when used to discriminate gaussian noise from two
different non-gaussian models. One model employs bumps of a gaussian profile, the other
bumps of a square tooth profile. We consider the former more physically reasonable, and 
consequently adopt the power p = 3 in our subsequent analysis. 

Figure 6: A set of results from the UCSB SP91 experiment. This shows the temper- 
ature offset dT, averaged over all frequency channels, for each of 13 points separated by 2.1 
degrees. UCSB SP91 is a single-difference experiment, so these values actually represent 
(roughly) the difference between the CMBR intensities at two different points. The 'spike' 
visible at points 7 and 8 thus suggests a region of sharp gradient in the CMBR, unless the 
data are contaminated at these points. 

Figure 7: The significance levels at which each of UCSB SP91's four channels, as 
well as the mean of these channels, can be shown non-gaussian by each of the statistics 
S_{q;3(q-1)} for 1 ≤ q ≤ 13. S_{3;6} shows the best overall performance.